g02gbf
g02gbf
© Numerical Algorithms Group, 2002.
Purpose
G02GBF Fits a generalized linear model with binomial errors
Synopsis
[dev,idf,b,irank,se,cov,v,ifail] = g02gbf(x,isx,y,t<,link,wt,mean,offset,v,...
tol,maxit,iprint,epslon,weight,ifail>)
Description
A generalized linear model with binomial errors consists of the
following elements:
(a) a set of n observations, y , from a binomial distribution:
i
(t) y t-y
(y)(pi) (1-(pi)) .
(b) X, a set of p independent variables for each observation,
x ,x ,...,x .
1 2 p
(c) a linear model:
--
(eta)= > (beta) x .
-- j j
(d) a link between the linear predictor, (eta), and the mean of
the distribution, (mu)=(pi)t, the link function,
(eta)=g((mu)). The possible links functions are:
( (mu) )
(i) logistic link: (eta)=log( ------),
( t-(mu))
-1( (mu))
(ii) probit link: (eta)=(Phi) ( ----),
( t )
( ( (mu)))
(iii) complementary log-log link: log(-log(1- ----)).
( ( t ))
(e) a measure of fit, the deviance:
n n { ( y ) (
-- ^^^^ -- { ( i ) (
> dev(y ,(mu) )= > 2{y log( -----)+(t -y )log(
-- i i -- { i ( ^^^^ ) i i (
i=1 i=1 { ( (mu) ) (
{ ( i) (
(t -y ) )}
i i )}
----------)}
^^^^ )}
(t -(mu) ))}
i i )}
The linear parameters are estimated by iterative weighted least-
squares. An adjusted dependent variable, z, is formed:
d(eta)
z=(eta)+(y-(mu)) ------
d(mu)
and a working weight, w,
_____________
( d(eta))2 / t
w=((tau) ------) where (tau)= / ------------
( d(mu) ) \/ (mu)(t-(mu))
At each iteration an approximation to the estimate of (beta),
^^^^^^
(beta), is found by the weighted least-squares regression of z on
X with weights w.
1/2 1/2
G02GBF finds a QR decomposition of w X, i.e.,w X=QR where R
is a p by p triangular matrix and Q is an n by p column
orthogonal matrix.
^^^^^^
If R is of full rank, then (beta) is the solution to:
^^^^^^ T 1/2
R(beta)=Q w z
If R is not of full rank a solution is obtained by means of a
singular value decomposition (SVD) of R.
(D 0) T
R=Q (0 0)P ,
*
where D is a k by k diagonal matrix with non-zero diagonal
1/2
elements, k being the rank of R and w X.
This gives the solution
(Q 0)
^^^^^^ -1( * ) T 1/2
(beta)=P D (0 I )Q w z
1
P being the first k columns of P, i.e., P=(P P ).
1 1 0
The iterations are continued until there is only a small change
in the deviance.
The initial values for the algorithm are obtained by taking
^^^^^
(eta)=g(y)
The fit of the model can be assessed by examining and testing the
deviance, in particular by comparing the difference in deviance
between nested models, i.e., when one model is a sub-model of the
other. The difference in deviance between two nested models has,
2
asymptotically, a (chi) distribution with degrees of freedom
given by the difference in the degrees of freedom associated with
the two deviances.
^^^^^^
The parameters estimates, (beta), are asymptotically Normally
distributed with variance-covariance matrix:
T
-1 -1
C=R R in the full rank case, otherwise
-2 T
C=P D P
1 1
The residuals and influence statistics can also be examined.
^^^^^ ^^^^^^
The estimated linear predictor (eta)=X(beta), can be written as
1/2
Hw z for an n by n matrix H. The ith diagonal elements of H, h
i
give a measure of the influence of the ith values of the
independent variables on the fitted regression model. These are
sometimes known as leverages.
^^^^ -1 ^^^^^
The fitted values are given by (mu)=g ((eta)).
G02GBF also computes the deviance residuals, r:
_____________
^^^^ / ^^^^
r =sign(y -(mu) ) / dev(y ,(mu) )
i i i \/ i i
An option allows the use of prior weights in the model.
In many linear regression models the first term is taken as a
mean term or an intercept, i.e., x ,1 = 1, for i=1,2,...,n. This
i
is provided as an option.
Often only some of the possible independent variables are
included in a model, the facility to select variables to be
included in the model is provided.
If part of the linear predictor can be represented by variables
with a known coefficient then this can be included in the model
by using an offset, o:
--
(eta)=o+ > (beta) x .
-- j j
If the model is not of full rank the solution given will be only
one of the possible solutions. Only certain linear combinations
of the parameters will have unique estimates, these are known as
estimable functions.
Details of the SVD, are made available, in the form of the matrix
*
P :
( -1 T)
(D P )
( 1)
* ( T )
P =( P ).
( 0 )
Parameters
g02gbf
Required Input Arguments:
x (:,:) real
isx (:) integer
y (:) real
t (:) real
Optional Input Arguments: <Default>
link (1) string 'g'
wt (:) real zeros(length(y),1)
mean (1) string 'm'
offset (1) string 'n'
v (:,:) real zeros(length(y),ip+7)
tol real 10*eps
maxit integer 10
iprint integer -1
epslon real eps
weight (1) string g02gbf04(wt)
ifail integer -1
Output Arguments:
dev real
idf integer
b (:) real
irank integer
se (:) real
cov (:) real
v (:,:) real
ifail integer